Study and Simulation of a Distributed Real-time Fault-tolerance Web Monitoring System
Authors
Abstract
The goal of this project is to study and simulate a distributed real-time fault-tolerance web monitoring system. Fault tolerance is provided by scheduling multiple copies of a task on different computer nodes in a distributed computing system. A fault-tolerant system automatically recovers from a specified number of failures. If the primary task cannot be completed because of a fault, the scheduled backup task is run, so all tasks are assured to complete. We use the web to monitor the fault-tolerant behavior of a distributed system. A web monitoring system is a convenient way to monitor remote tasks, both primary and backup. Our simulation results show that it is possible to set up a distributed real-time fault-tolerance web monitoring system. To reach this goal quickly, we use the existing Ganglia network and RRDTool technology. We found that the Ganglia system can be used to set up such a monitoring system. RRDTool is used for simulation purposes only: it displays the simulation results graphically, but it cannot accurately store the status data of the tasks. In further research, we will use MySQL instead of RRDTool to store the status of the real-time tasks and to yield more accurate results within the time constraints.
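The primary/backup idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the function `run_with_backup`, the node names, and the set of failed nodes are all hypothetical, and a node fault is modeled simply as membership in a set.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str
    node: str  # node on which the task completed
    role: str  # "primary" or "backup"


def run_with_backup(task, primary_node, backup_node, failed_nodes):
    """Run the primary copy; fall back to the backup copy if the primary node is faulty."""
    # The two copies must be placed on different nodes to tolerate a node fault.
    if primary_node == backup_node:
        raise ValueError("primary and backup must be on different nodes")
    if primary_node not in failed_nodes:
        return TaskResult(task, primary_node, "primary")
    if backup_node not in failed_nodes:
        return TaskResult(task, backup_node, "backup")
    # More faults than the schedule was designed to tolerate.
    raise RuntimeError(f"{task}: both copies failed")


# Hypothetical schedule: task -> (primary node, backup node)
schedule = {"t1": ("node-a", "node-b"), "t2": ("node-b", "node-c")}
failed = {"node-b"}  # assumed single-node fault

results = [run_with_backup(t, p, b, failed) for t, (p, b) in schedule.items()]
for r in results:
    print(f"{r.task}: completed on {r.node} as {r.role}")
```

With one failed node, every task still completes: `t1` on its primary and `t2` on its backup. A monitoring front end would report exactly this kind of per-task status.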
Similar resources
Building and Verifying Fault-Tolerant Autonomous Real-Time Systems for Space Applications
NASA missions require autonomous systems that perform correctly for an extended period of time. These systems must make real-time decisions in logical sequence that meet timing requirements. These systems must anticipate faults induced by environmental change, but it is difficult to anticipate the infinite variety of situations one must encounter for the design of robotic explorers, spacecraf...
A generalized ABFT technique using a fault tolerant neural network
In this paper we first show that the standard BP algorithm cannot yield a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate the fault tolerance of a neural network, and then we show that the sensitivity of a link is closely related to the amount of information that passes through it. Based on this assumption, we prove that the distribu...
Runtime Verification for Ultra-Critical Systems
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs....
Towards Predictable CORBA-Based Web-Services
Over the past several years, the World Wide Web has emerged from a research project to an environment for open, commercial services, such as online-banking, travel reservation, and stock-trading. However, in contrast to the best-effort approach provided by the Web, many of those services demand higher predictability and quality-of-service properties such as security, end-to-end availability, de...
Scheduling Simulation in a Distributed Wireless Embedded System
The aims of the research are to develop a distributed simulation environment and to investigate techniques that support efficient task scheduling algorithms in fault-tolerant, real-time, distributed, and wireless embedded systems. Techniques we developed include deadline-based real-time scheduling, priority-based scheduling, redundant resource allocation for fault-tolerance, energy-aware, and s...